The main goals for 8/22-9/8:
Look at correlations with environmental variables
Work on fixing Edna’s code
Here is what the old Synechococcus oligotyping results look like:
To check the sensitivity of the Synechococcus oligotyping results, I subsampled the data in mothur to 5,000 reads per sample and made a new fasta file to oligotype. This cutoff normalizes the total number of reads per sample (including chloroplasts). To normalize only the bacterial reads per sample, I would have to subsample down to 500 or 1,000 reads.
These look identical… hm, maybe I did something wrong? But no: the original fasta of Synechococcus reads has 325,410 lines (1,432 unique sequences) and the subsampled fasta has 323,838 lines (476 unique). So the subsampling definitely worked. It got rid of the rare reads assigned to Synechococcus, which are probably filtered out in the MED pipeline anyway.
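Here is a minimal Python sketch of why subsampling removes rare sequences. Drawing a fixed number of reads per sample without replacement rarely picks up variants that occur only once or twice, so the number of unique sequences drops while the common ones survive. This only illustrates the idea and is not mothur's `sub.sample` implementation. Dropping samples that fall below the depth, the function names, and the toy sequences are all assumptions of this sketch.

```python
import random


def read_fasta(text):
    """Parse FASTA text into a list of sequences; multi-line records are joined."""
    seqs, current, in_record = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if in_record:
                seqs.append("".join(current))
            current, in_record = [], True
        else:
            current.append(line)
    if in_record:
        seqs.append("".join(current))
    return seqs


def rarefy_samples(samples, depth, seed=0):
    """Draw `depth` reads without replacement from every sample.

    samples: dict of sample name -> list of read sequences.
    Samples with fewer than `depth` reads are dropped (assumption of this sketch).
    """
    rng = random.Random(seed)
    return {name: rng.sample(reads, depth)
            for name, reads in samples.items() if len(reads) >= depth}


def summarize(reads):
    """Return (number of reads, number of unique sequences)."""
    return len(reads), len(set(reads))


# Toy data (hypothetical): one dominant variant plus a few rare ones.
samples = {"s1": ["AAAA"] * 95 + ["CCCC"] * 4 + ["GGGG"], "s2": ["AAAA"] * 3}
sub = rarefy_samples(samples, depth=10, seed=1)
print("before:", summarize(samples["s1"]), "after:", summarize(sub["s1"]))
```

With one-line sequences, a FASTA file has two lines per read (header plus sequence), so line counts and read counts differ by a factor of two.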
That talk title you sent me said that Synechococcus has higher diversity in less saline systems and is negatively correlated with total N. The inverse correlation with N does not seem to hold in our system: if anything, Synechococcus blooms more during the earlier part of the season, when N is high. Lake Erie is fairly saline for a freshwater lake, but obviously much less so than the Baltic.
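A Spearman rank correlation would be one quick way to check the Synechococcus/N relationship in our data. It only assumes a monotonic relationship, not a linear one. The sketch below computes it from scratch, giving tied values their average rank and skipping pairs with a missing value. `scipy.stats.spearmanr` does the same job. The numbers in the usage lines are hypothetical, not our measurements.

```python
import math


def _missing(v):
    """Treat None and float NaN as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def rank(values):
    """1-based ranks; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks, skipping incomplete pairs."""
    pairs = [(a, b) for a, b in zip(x, y) if not (_missing(a) or _missing(b))]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    return pearson(rank(xs), rank(ys))


# Hypothetical values for illustration only.
total_n = [120, 95, 80, 60, 40, 35]
syn_reads = [300, 280, 150, 90, 60, 20]
print(spearman(total_n, syn_reads))
```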
Funny aside: if you search “lake erie salinity” on Google, one of the top results is this article from the Christian Science Monitor: http://www.csmonitor.com/1982/0414/041450.html
It’s unclear to me whether there is true ecological variation in Limnohabitans or whether the three oligotypes just represent 16S rRNA gene copy variants. The relative proportions of the three oligotypes to each other don’t change much throughout the season.
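If the three oligotypes were 16S copy variants from the same genomes, each one's share of total Limnohabitans reads should stay roughly constant from sample to sample, because it would be fixed by the copy ratio. Ecologically distinct units could instead shift relative to one another. Below is a sketch of that check. It assumes a hypothetical per-sample table of oligotype read counts, computes each oligotype's share of the genus total, and measures how much that share varies across samples. A small spread is consistent with copy variants but does not prove it, since distinct ecotypes could also covary.

```python
import statistics


def within_genus_proportions(counts):
    """Share of each oligotype in the genus total, per sample.

    counts: dict of sample -> list of oligotype read counts (same order in every sample).
    Samples where the genus is absent (total of 0) are skipped.
    """
    props = {}
    for sample, row in counts.items():
        total = sum(row)
        if total > 0:
            props[sample] = [c / total for c in row]
    return props


def proportion_spread(props):
    """Population standard deviation of each oligotype's share across samples."""
    columns = zip(*props.values())
    return [statistics.pstdev(col) for col in columns]


# Hypothetical counts for three oligotypes.
counts = {"jun": [20, 10, 10], "aug": [40, 21, 19], "oct": [0, 0, 0]}
print(proportion_spread(within_genus_proportions(counts)))
```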
In your email you said that Limnohabitans is a typical copiotroph, reacting positively to algal blooms but also to increased DOC input after heavy rain events.
I made a bunch of scatter plots and fit simple linear models to look at relationships between log Limnohabitans abundance and other variables:
##
## Call:
## lm(formula = log(Abundance) ~ log(Nitrate))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.73474 -0.71543 0.04435 0.61351 2.83915
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.64879 0.26722 2.428 0.0158 *
## log(Nitrate) 0.30085 0.05004 6.012 5.84e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.074 on 273 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.1169, Adjusted R-squared: 0.1137
## F-statistic: 36.15 on 1 and 273 DF, p-value: 5.842e-09
##
## Call:
## lm(formula = log(Abundance) ~ log(Ammonia))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.3768 -0.7432 0.1272 0.7609 2.5015
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.90038 0.09229 20.591 < 2e-16 ***
## log(Ammonia) 0.21272 0.04454 4.775 2.93e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.098 on 273 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.07709, Adjusted R-squared: 0.07371
## F-statistic: 22.8 on 1 and 273 DF, p-value: 2.931e-06
##
## Call:
## lm(formula = log(Abundance) ~ log(N.P))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.5421 -0.7690 0.1790 0.7567 2.5938
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0509 0.3073 9.928 < 2e-16 ***
## log(N.P) -0.4447 0.1601 -2.778 0.00586 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.13 on 268 degrees of freedom
## (13 observations deleted due to missingness)
## Multiple R-squared: 0.02798, Adjusted R-squared: 0.02436
## F-statistic: 7.716 on 1 and 268 DF, p-value: 0.005861
##
## Call:
## lm(formula = log(Abundance) ~ log(SRP))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.3035 -0.7993 0.1492 0.7526 2.9558
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.22635 0.07239 30.756 <2e-16 ***
## log(SRP) -0.05889 0.04003 -1.471 0.142
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.141 on 268 degrees of freedom
## (13 observations deleted due to missingness)
## Multiple R-squared: 0.008009, Adjusted R-squared: 0.004307
## F-statistic: 2.164 on 1 and 268 DF, p-value: 0.1425
##
## Call:
## lm(formula = log(Abundance) ~ log(POC))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.69548 -0.72431 0.08871 0.72496 2.49203
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.29778 0.06285 36.558 < 2e-16 ***
## log(POC) -0.55752 0.06882 -8.101 1.84e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.026 on 273 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.1938, Adjusted R-squared: 0.1908
## F-statistic: 65.62 on 1 and 273 DF, p-value: 1.835e-14
##
## Call:
## lm(formula = log(Abundance) ~ LogChla)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.88762 -0.62302 0.05095 0.75092 2.86467
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.57115 0.15860 22.517 <2e-16 ***
## LogChla -0.40093 0.04316 -9.289 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9958 on 273 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.2402, Adjusted R-squared: 0.2374
## F-statistic: 86.29 on 1 and 273 DF, p-value: < 2.2e-16
## Warning in log(LogPhyco): NaNs produced
##
## Call:
## lm(formula = log(Abundance) ~ log(LogPhyco))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.90274 -0.77604 0.03967 0.73097 2.72162
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.12768 0.09670 22.003 < 2e-16 ***
## log(LogPhyco) -0.32692 0.07478 -4.372 2.1e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.051 on 176 degrees of freedom
## (105 observations deleted due to missingness)
## Multiple R-squared: 0.09796, Adjusted R-squared: 0.09283
## F-statistic: 19.11 on 1 and 176 DF, p-value: 2.103e-05
##
## Call:
## lm(formula = log(Abundance) ~ log(Temp))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.7871 -0.7293 0.1697 0.8387 2.5476
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.5358 0.7649 7.238 5.86e-12 ***
## log(Temp) -1.1025 0.2576 -4.280 2.68e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.082 on 245 degrees of freedom
## (36 observations deleted due to missingness)
## Multiple R-squared: 0.06957, Adjusted R-squared: 0.06577
## F-statistic: 18.32 on 1 and 245 DF, p-value: 2.681e-05
##
## Call:
## lm(formula = log(Abundance) ~ LogParMC)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.27348 -0.77399 0.07611 0.82739 2.27695
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.93079 0.08212 23.511 < 2e-16 ***
## LogParMC -0.25924 0.04211 -6.156 4.99e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.088 on 174 degrees of freedom
## (107 observations deleted due to missingness)
## Multiple R-squared: 0.1789, Adjusted R-squared: 0.1741
## F-statistic: 37.9 on 1 and 174 DF, p-value: 4.995e-09
The best-fitting models (by R²) were those with chlorophyll a (R² = 0.24) and POC (R² = 0.19), followed by particulate microcystin (R² = 0.18, based on only 176 observations). All three relationships were negative.
There were significant positive relationships between Limnohabitans relative abundance and both ammonia and nitrate. The negative relationships with N:P, phycocyanin, and temperature were also significant, while SRP was not (p = 0.14).
It seems that Limnohabitans responds more to environmental conditions and allochthonous carbon than to DOC from the bloom. On October 6th, when Microcystis seems to disappear completely (a large temperature drop, probably accompanied by a rain event), Limnohabitans shoots up to almost 10% of the whole community.
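Each model above is a one-predictor ordinary least squares fit of log(Abundance) on a predictor that is mostly log-transformed. Here is a minimal Python sketch of the same kind of fit, assuming plain lists as input. The function name `loglog_fit` is made up for illustration. Rows with missing values, or with non-positive values that would be logged, are dropped. This mirrors how R removed the `NaN` rows. The warning for `log(LogPhyco)` means some `LogPhyco` values were ≤ 0 (likely because that variable is already log-transformed), and those rows were deleted along with the truly missing ones. Predictors that are already logged, like `LogChla` and `LogParMC`, would be passed with `log_predictor=False`.

```python
import math


def loglog_fit(abundance, predictor, log_predictor=True):
    """OLS fit of log(abundance) on log(predictor), or on predictor as-is.

    Pairs with missing values, abundance <= 0, or (when logging) predictor <= 0
    are dropped, since their logs are undefined.
    """
    xs, ys = [], []
    for a, p in zip(abundance, predictor):
        if a is None or p is None or a <= 0:
            continue
        if log_predictor:
            if p <= 0:
                continue  # R gives NaN (p < 0) or -Inf (p == 0) here
            p = math.log(p)
        xs.append(p)
        ys.append(math.log(a))
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 complete observations")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return {"intercept": intercept, "slope": slope, "r2": 1 - ss_res / ss_tot, "n": n}


# Hypothetical data; the None row is dropped, like R's "deleted due to missingness".
nitrate = [1.0, 2.0, 4.0, 8.0, None]
abund = [2.7, 3.9, 5.4, 7.7, 1.2]
print(loglog_fit(abund, nitrate))
```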
I didn’t run any correlations, but I read a bunch of papers on multivariate methods:
mvabund: fits a separate GLM to each OTU using a common set of explanatory variables. BUT we would expect responses to most of these variables to be unimodal, not linear.
Jamil et al. 2015 (PLOS ONE) used a Bayesian framework to link phytoplankton community data, environmental variables, and traits. They use a Gaussian logistic model whose parameters (optimum, tolerance, maximum) depend linearly on species traits. We could do this without the trait part for the HABs data, and with the traits for Jeff’s data.
Ramette 2007 is a good overview of more familiar multivariate methods such as ordination and db-RDA.
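On the unimodal point, a GLM is linear only on the link scale. Adding a squared term to a log-link model therefore gives a hump-shaped (Gaussian) response with an optimum, a tolerance, and a maximum. These are the same three quantities the Gaussian logistic model in Jamil et al. uses. The Python sketch below converts quadratic log-link coefficients into those parameters. It also evaluates a Gaussian logistic curve for presence/absence. The function names and the `p_max` parameterization are my own assumptions. Jamil et al. may parameterize the model differently, and the trait-dependence of the parameters is left out here.

```python
import math


def gaussian_from_quadratic(b0, b1, b2):
    """Turn log-link coefficients of log(mu) = b0 + b1*x + b2*x**2 into the
    optimum u, tolerance t and maximum c of mu = c * exp(-(x - u)**2 / (2 * t**2))."""
    if b2 >= 0:
        raise ValueError("b2 must be negative for a hump-shaped response")
    u = -b1 / (2 * b2)
    t = math.sqrt(-1 / (2 * b2))
    c = math.exp(b0 - b1 ** 2 / (4 * b2))
    return u, t, c


def gaussian_response(x, u, t, c):
    """Expected abundance under the Gaussian response curve."""
    return c * math.exp(-((x - u) ** 2) / (2 * t ** 2))


def gaussian_logistic(x, optimum, tolerance, p_max):
    """Probability of occurrence with
    logit(p) = logit(p_max) - (x - optimum)**2 / (2 * tolerance**2),
    so p peaks at p_max when x == optimum."""
    if not 0 < p_max < 1:
        raise ValueError("p_max must be strictly between 0 and 1")
    eta = math.log(p_max / (1 - p_max)) - (x - optimum) ** 2 / (2 * tolerance ** 2)
    # Numerically stable inverse logit (avoids overflow far from the optimum).
    if eta >= 0:
        return 1 / (1 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1 + z)


# Hypothetical coefficients: optimum 10, tolerance 2, peak of 100 reads.
u, t, c = gaussian_from_quadratic(math.log(100) - 12.5, 2.5, -0.125)
print(u, t, c)
print(gaussian_logistic(24.0, optimum=22.0, tolerance=3.0, p_max=0.9))
```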
“Because of their small genomes, marine picocyanobacteria possess a limited gene complement per cell (Table 1). Gene number ranges from 2,358 to 3,129 in Synechococcus to 1,716 to 3,022 in Prochlorococcus and with few paralogous genes. The high diversity of gene complement plus efficient horizontal gene transfer (213) suggests that marine picocyanobacteria conform to the distributed-genome hypothesis, i.e., that their full complement of genes exists in a “supragenome,” one that each member of the population contributes to and draws genes from; in other words, no single isolate contains the full complement of genes, resulting in a high degree of genomic variation (65). Thus, their supragenome (sometimes also called “pan-genome” [288]) is probably several orders of magnitude larger than the genome of any single strain and consists of a large set of noncore genes from which highly variable subsets of genes are brought together in various combinations and numbers to generate the specific gene complement of each strain or ecotype.”
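Here is a toy illustration of the core/pan-genome idea in the quote. The core genome is the set of genes shared by every strain. The pan-genome (supragenome) is the union over all strains, and whatever is left over is the noncore (accessory) set. The strain names and gene IDs below are made up. A real analysis would first cluster genes into orthologous families.

```python
def pangenome_summary(genomes):
    """Split gene families into core, accessory (noncore), and pan-genome sets.

    genomes: dict of strain name -> set of gene family IDs.
    """
    if not genomes:
        raise ValueError("need at least one genome")
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)  # present in every strain
    pan = set.union(*gene_sets)          # present in at least one strain
    return core, pan - core, pan


# Made-up strains and gene IDs.
genomes = {"strainA": {"g1", "g2", "g3"}, "strainB": {"g1", "g2", "g4"}, "strainC": {"g1", "g5"}}
core, accessory, pan = pangenome_summary(genomes)
print(len(core), len(accessory), len(pan))
```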
Limnohabitans has three oligotypes; it’s unclear whether they are really different ecological units or just 16S copy variants. Limnohabitans seems to be negatively impacted by Microcystis: it completely disappears during the main phase of the Microcystis bloom. Conversely, it responds positively to
The chloroplast data is messy because there are hundreds of oligotypes. I pruned them down heavily, to just 28, with an M parameter of 5000. A few trends stick out:
- Different oligotypes in the 100 µm/53 µm fractions vs. the 3 µm fraction
- Less diversity in 100 µm/53 µm (or is this sampling bias, since these samples had lower yields and generally lower sequencing depth?)
- Overall, chloroplasts are a major contributor to every fraction
- The cyan oligotype seems to be present mainly at WE4
- The lime green oligotype is present mainly at the beginning and end of the season, when Microcystis is not abundant
- The grey oligotype coexists with MC, but appears to be negatively impacted?
Correlations and multivariate analyses: see papers above.
Edna’s stuff: I didn’t want to work on this while Marian was gone, and didn’t have time last week, but I will make it a priority over the next week.
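For the chloroplast pruning, the M parameter in oligotyping is, as I understand it, the minimum substantive abundance an oligotype needs in order to be kept. The sketch below is a simplified total-read-count version of that filter. The pipeline's exact rule may differ, so check the oligotyping docs. The sketch also counts richness for a set of samples. Richness depends on sequencing depth, so comparing the 100 µm/53 µm fractions with the 3 µm fraction would be fairer after subsampling all samples to the same depth. Sample and oligotype names are hypothetical.

```python
def prune_oligotypes(counts, min_abundance):
    """Keep oligotypes whose total reads across all samples reach min_abundance.

    counts: dict of oligotype -> dict of sample -> read count.
    Simplified stand-in for oligotyping's M filter (assumption).
    """
    return {oligo: per_sample for oligo, per_sample in counts.items()
            if sum(per_sample.values()) >= min_abundance}


def richness(counts, samples):
    """Number of oligotypes with at least one read in any of `samples`."""
    return sum(1 for per_sample in counts.values()
               if any(per_sample.get(s, 0) > 0 for s in samples))


# Hypothetical counts; sample names are made up.
counts = {"cyan": {"A_3um": 6000, "B_3um": 10},
          "lime": {"B_3um": 3000, "B_100um": 2500},
          "rare": {"B_100um": 40}}
kept = prune_oligotypes(counts, min_abundance=5000)
print(sorted(kept), richness(kept, ["A_3um", "B_3um"]), richness(kept, ["B_100um"]))
```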